Added initial implementation of kosli snapshot cloud-run server/#4986#833
Added initial implementation of kosli snapshot cloud-run server/#4986#833ToreMerkely wants to merge 11 commits intomainfrom
Conversation
ToreMerkely
commented
Apr 28, 2026
- feat(snapshot): add hidden cloud-run skeleton command (#4986)
- feat(cloudrun): add internal/cloudrun package wrapping the Cloud Run Admin API (#4986)
- feat(snapshot): wire cloud-run command end-to-end via internal/cloudrun (#4986)
- feat(snapshot): add service filtering flags to cloud-run command (#4986)
- feat(cloudrun): friendly auth and not-found error messages (#4986)
- fix(cloudrun): align snapshot payload with server's snapshot-examples doc (#4986)
|
Claude finished @ToreMerkely's task in 2m 19s —— View job PR Review:
|
Registers `kosli snapshot cloud-run ENVIRONMENT-NAME` as a hidden, in-development command for Google Cloud Run. The command takes required `--project` and `--region` flags, forces dry-run mode, and prints a placeholder line — no GCP API calls or HTTP requests yet. This is the first slice of the Cloud Run feature; subsequent slices will add the GCP client wrapper, the end-to-end happy path, filtering, auth UX, and finally unhide the command. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Admin API (#4986) Slice 2 of the snapshot cloud-run feature. Adds a small, unit-tested wrapper around cloud.google.com/go/run/apiv2 that lists services in a GCP project + region and, for each service, returns one Revision per entry in the service's traffic configuration (any percent including 0%). TrafficTargetAllocationType_LATEST resolves via the service's LatestReadyRevision, and revisions referenced more than once are deduped. Digest extraction mirrors the ECS fallback in internal/aws/aws.go: use a @sha256: substring when present, else leave the digest empty rather than calling a registry. Production code reaches GCP through an unexported apiClient seam so tests substitute a fake without touching ADC. No command wiring yet; that's the next slice. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…un (#4986) Slice 3 of the snapshot cloud-run feature. The command now calls cloudrun.New + ListServices, flattens the result into an EnvRequest payload, and submits a PUT to /report/cloud-run via kosliClient.Do. The PreRunE-forced dry-run keeps the request from actually leaving the client, so the (still-missing) server-side endpoint is not on the path yet. Each artifact carries the GCP project and region in addition to service_name, so revision rows are self-describing — a small extension of the EcsEnvRequest shape. The command depends on a local cloudRunLister interface and a package- level newCloudRunClient variable, letting tests swap in a stub without touching ADC; integration was sanity-checked against the real hello-world-cli-demo project and produced the expected digest-pinned artifact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slice 4 of the snapshot cloud-run feature. Adds --services, --services-regex, --exclude, and --exclude-regex, mirroring the ECS service filtering shape and reusing the existing filters.ResourceFilterOptions struct. PreRunE rejects the four include/exclude mutex pairs. Filtering is applied in the command after cloudrun.ListServices returns. Services excluded by name still incur their per-revision API round-trips; pushing the filter into the GCP wrapper to skip those calls is a tractable follow-up if the round-trip cost becomes a bottleneck. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slice 6 of the snapshot cloud-run feature. cloudrun.Classify maps gRPC Unauthenticated / PermissionDenied / NotFound responses into actionable messages and preserves the underlying error via %w; other codes pass through. The Unauthenticated message names all three credential sources (env var, gcloud command, GCE/GKE metadata server or Workload Identity) because the production deployment is a cluster cron job, not a local-dev gcloud session. Classification lives in internal/cloudrun (the package owns GCP knowledge) but is applied at the command boundary, not inside Client.ListServices. Doing it inside the Client would double-wrap real-call errors when the command also classified them, and it would bypass the stubbed test path entirely. Calling Classify once at the command boundary covers both real and stub error sources. cloudrun.New now wraps construction errors with a generic "GCP client setup failed" prefix; the cluster case rarely fails here, and the SDK message is preserved for diagnosis. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… doc (#4986) Three changes brings the payload into line with the conventions documented in the server repo's out-snapshot-examples.txt: - Added the top-level "type": "cloud-run" literal that every other env-type report ships explicitly. Without it, the (still-to-come) CloudRunReport model on the server would only accept the URL default; with it, the request is unambiguous. - Renamed the per-artifact "service_name" to "serviceName" (camelCase) matching the doc's K8S/ECS examples. The existing CLI's ECS code uses snake_case, but a new type is better off following the doc. - Dropped the per-artifact "project" and "region" fields. The doc notes extra="forbid" on every Pydantic model, so unilaterally picking custom field names risks 422 once the server defines its CloudRunReport. Project + region are already in the URL and flags, mirroring how ECS reports don't carry account/region per artifact. Verified end-to-end against the hello-world-cli-demo project; payload now matches the doc shape exactly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
48c4c0e to
fa367b3
Compare
…(#4986) The serviceName JSON tag was originally chosen to match the server's out-snapshot-examples doc, but the existing CLI's ECS code sends cluster_name/service_name on the wire and is in production today; Azure does the same with app_name/app_kind/digests_source. The doc disagrees with reality, so align with reality. The wire convention across types now is: unique-ID field camelCase (revisionName, taskArn, podName), grouping field snake_case (service_name, cluster_name, app_name), digests + creationTimestamp universal. Also numbered the cmdTestCase entries in TestSnapshotCloudRunCmd (01-09) to match the convention from attestSonar_test.go, which makes individual cases easier to spot when scanning suite output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR review feedback. Two issues: cloudrun.New opened gRPC services and revisions clients but never closed them. For a one-shot CLI invocation the OS reclaims the sockets, but the gRPC library logs noisy "clientConn switching balancer" warnings on stderr and the keepalive goroutines outlive the work. There was also a partial-failure leak: if NewRevisionsClient returned an error after services succeeded, services was abandoned. Adds Client.Close() releasing both connections (with the second attempted even when the first errors) and a cleanup of services in the partial-failure path. The command defers Close via an io.Closer type-assert, so the test stub does not need to implement it. Also replaced `filtered := services[:0]` (correct in-place reuse but a subtle idiom — relies on range copying each element before the overwrite via append) with `make([]cloudrun.Service, 0, len(services))` for clarity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| // RevisionData represents one Cloud Run revision in the snapshot payload. | ||
| // One artifact is emitted per revision in each service's traffic configuration. | ||
| type RevisionData struct { | ||
| RevisionName string `json:"revisionName"` |
There was a problem hiding this comment.
Nit (naming consistency): revisionName (camelCase) and service_name (snake_case) in the same struct is unusual. I see from the handover doc and the ECS precedent (taskArn + service_name) that this is a deliberate choice to match the existing wire format. Just flagging for anyone reading the code fresh — it looks like a typo but is intentional.
No change needed.
| // Close releases the underlying gRPC connections. Safe to call on a Client | ||
| // constructed with a fake apiClient (returns nil). Returns the first error | ||
| // from closing either client; the second is always attempted. | ||
| func (c *Client) Close() error { |
There was a problem hiding this comment.
Minor: Close() silently drops the second error.
If both services.Close() and revisions.Close() fail, only sErr is returned — rErr is silently dropped. For a CLI that's about to exit this is low-impact, but errors.Join (Go 1.20+) would preserve both:
| // Close releases the underlying gRPC connections. Safe to call on a Client | |
| // constructed with a fake apiClient (returns nil). Returns the first error | |
| // from closing either client; the second is always attempted. | |
| func (c *Client) Close() error { | |
| func (c *Client) Close() error { | |
| g, ok := c.api.(*gcpAPI) | |
| if !ok { | |
| return nil | |
| } | |
| return errors.Join(g.services.Close(), g.revisions.Close()) | |
| } |
This also simplifies the function. Low priority — the current version works fine for a short-lived CLI process.
|
|
||
| const snapshotCloudRunShortDesc = `Report a snapshot of running services in a Google Cloud Run project and region to Kosli. ` | ||
| const snapshotCloudRunLongDesc = snapshotCloudRunShortDesc + ` | ||
| Currently a hidden, in-development command — it always runs in dry-run mode regardless of the --dry-run flag.` |
There was a problem hiding this comment.
Minor: LongDesc still says "it always runs in dry-run mode".
This is correct for now since the command is hidden and DryRun is forced to true in PreRunE. Just a reminder that Slice 7 (unhide + lift dry-run) will need to update this string. No change needed now.
Lifts the forced dry-run lock that's been on since Slice 1, so the
command actually PUTs the snapshot to /report/cloud-run. Adds the
ECS-style success log "[N] revisions were reported to environment X"
when the request succeeds outside dry-run.
Test changes:
- SetupTest now creates the env with type "cloud-run" via CreateEnv,
the same way ECS/K8S tests do. Confirms the local server's
CloudRunReport endpoint accepts the new env-type literal.
- The dry-run-asserting tests (cmdTestCase 05 and the four
TestSnapshotCloudRunFilter_* cases) take explicit --dry-run since
the global default is no longer forced.
- New TestSnapshotCloudRunCmd_HappyPathReportsToServer exercises the
full CLI → local Kosli server roundtrip with the GCP client
stubbed, asserting the success log line.
- Stub fixtures use 64-char hex digests because the server's
CloudRunReport validates `^[a-f0-9]{64}$` per digest value.
Hidden: true stays for now; unhiding + CLI reference docs land in
the next slice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| func (c *Client) Close() error { | ||
| g, ok := c.api.(*gcpAPI) | ||
| if !ok { | ||
| return nil | ||
| } | ||
| sErr := g.services.Close() | ||
| rErr := g.revisions.Close() | ||
| if sErr != nil { | ||
| return sErr | ||
| } | ||
| return rErr | ||
| } |
There was a problem hiding this comment.
Minor: Close() silently drops the second error.
As noted in the previous review round, if both services.Close() and revisions.Close() fail, rErr is silently dropped. Since you're on Go 1.25, errors.Join is available and simplifies this:
| func (c *Client) Close() error { | |
| g, ok := c.api.(*gcpAPI) | |
| if !ok { | |
| return nil | |
| } | |
| sErr := g.services.Close() | |
| rErr := g.revisions.Close() | |
| if sErr != nil { | |
| return sErr | |
| } | |
| return rErr | |
| } | |
| func (c *Client) Close() error { | |
| g, ok := c.api.(*gcpAPI) | |
| if !ok { | |
| return nil | |
| } | |
| return errors.Join(g.services.Close(), g.revisions.Close()) | |
| } |
(Requires adding "errors" to the import block.)
Low impact for a short-lived CLI, but it's a one-line improvement and preserves both errors for diagnostics.